Abstract

We introduce a new parameter to discuss the behavior of a genetic
algorithm. This parameter is the mean number of exact copies of
the best fit chromosomes from one generation to the next. We believe
that the genetic algorithm operates best when this parameter
is slightly larger than 1 and we prove two results supporting this
belief. We consider the case of the simple genetic algorithm with the
roulette-wheel selection mechanism. We denote by $\ell$ the length of the
chromosomes, by $m$ the population size, by $p_C$ the crossover
probability and by $p_M$ the mutation probability. We start the genetic
algorithm with an initial population whose maximal fitness is equal to
$f_0^*$ and whose mean fitness is equal to $\overline{f_0}$. We show that,
in the limit of large populations, the dynamics of the genetic algorithm
depends in a critical way on the parameter
$\pi = (f_0^*/\overline{f_0})(1-p_C)(1-p_M)^\ell$.
If $\pi < 1$, then the genetic algorithm might operate in a disordered
regime: there exist positive constants $\beta$ and $\kappa$ which do not
depend on $m$ such that, for some fitness landscapes and some initial
populations, with probability larger than $1 - 1/m^\beta$, before
generation $\kappa \ln m$, the best fit chromosome will disappear, and until
generation $\kappa \ln m$, the mean fitness will stagnate. If $\pi > 1$, then
the genetic algorithm operates in a quasispecies regime: there exist
positive constants $\kappa, p^*$ which do not depend on $m$ such that,
for any fitness landscape and any initial population, with probability
larger than $p^*$, until generation $\kappa \ln m$, the maximal fitness will
not decrease and before generation $\kappa \ln m$, the mean fitness will
increase by a factor $\sqrt{\pi}$. These results suggest that the mutation and
crossover probabilities should be tuned so that, at each generation,
maximal fitness $\times\ (1-p_C)(1-p_M)^\ell >$ mean fitness.

1 Introduction


A central problem in implementing a genetic algorithm efficiently is the
adjustment of the many parameters controlling the algorithm. If we focus on
the classical simple genetic algorithm, these parameters are the population
size and the probabilities of crossover and mutation. There exists a huge
literature discussing this question. The main message of the numerous
works conducted over the years is that, contrary to the initial hopes, there
exists no universal choice of parameters and the optimal choices depend
heavily on the fitness landscape. We refer the reader to [16] for a recent
review.

Our goal here is to draw attention to a single parameter, which
somehow sums up the effects of the various mechanisms at work in a genetic
algorithm, and which is quite natural from the probabilistic viewpoint.
The parameter we have in mind is the mean number of exact copies of
the best fit chromosomes from one generation to the next. Let us call
it $\pi$. We suggest that, at any generation, the various operators of the
genetic algorithm should be controlled in order to ensure that $\pi$ is slightly
larger than 1. Indeed, if $\pi < 1$, then the best fit chromosomes are doomed
to disappear quickly from the population. If $\pi > 1$, then, with positive
probability, the best fit chromosomes will perpetuate and one of them will
quickly become the most recent common ancestor of the whole population.
It is not desirable that $\pi$ be much larger than 1, in order to avoid the
premature convergence of the algorithm. The optimal situation is when
the population retains the best fit chromosomes and actively explores their
neighborhoods. Ideally we would like to have a few copies of the best fit
chromosomes and a cloud of mutants descending from them. This is why
we aim at tuning the parameters so that $\pi$ is only slightly larger than 1. An
interesting attempt to induce this behavior is what has been called "elitism"
in the genetic algorithm literature. Under elitism, the best fit chromosomes
are automatically retained from one generation to the next. However, we
believe that the resulting dynamics is intrinsically different from the one we
are aiming at when tuning the parameters so that $\pi > 1$. Indeed, we wish
to build a probabilistic dynamics which automatically focuses the search
around the best fit chromosomes, and it might be that, even using elitism,
the best fit chromosomes are quickly forgotten during the search and none
of them has a chance to become the most recent common ancestor.

An advantage of the parameter $\pi$ is that we can easily compute simple
bounds in terms of the parameters of the algorithm. This becomes
particularly true if we perform in addition an asymptotic expansion in one
or several parameters. If we do so, we can even prove rigorous results
which strongly support the previous ideas. More precisely, we will consider
here the case of large populations. This kind of analysis has been previously
conducted for the simple genetic algorithm with ranking selection [3].
We try here to extend this analysis to the simple genetic algorithm with
roulette-wheel selection. This task turned out to be very difficult, because
the dynamics is very sensitive to the variations of the fitness values. Most
of the results obtained for ranking selection do not hold with roulette-wheel
selection. We present only two results, which demonstrate that, depending
on the parameters and the fitness distribution of the current population,
the genetic algorithm can operate either in a disordered regime, where the
best fit chromosomes are typically lost, or in a quasispecies regime, where
the best fit chromosomes survive and invade a positive fraction of the
population. Our results have their roots in the quasispecies theory developed
by Eigen, McCaskill and Schuster [7]. We refer the reader to the introduction
of [3] for a quick summary of the development of these ideas, as well
as for pointers to the numerous relevant references in the genetic algorithm
literature.

There are several very interesting works which prove results related to
ours, even in a more general context. Lehre [ni[I3], Lehre and Dang [5],
Lehre and Yao [14] have succeeded in deriving upper bounds on the
expected time to reach an optimal solution in a quite general framework
covering a wide range of population algorithms and objective functions. These
results are more general and complex than ours. The results presented here
do not yield any estimate on the hitting time of the optimal solutions. Our
goal is to emphasize the importance of the parameter $\pi$ to understand the
behavior of the algorithm and its ability to take advantage of the best fit
chromosomes present in the population. To do so we consider only the case
of the simple genetic algorithm and we focus on its initial behavior in two
contrasting situations. In order to obtain a sharp and simple criterion, we
rely also on asymptotic estimates, valid for large populations. Thus our
results are much more specific than those obtained in [gimiTadi], yet they
are in some sense sharper. An interesting project would be to analyze the
relationship between $\pi$ and the quantities introduced in [13], like the
cumulative selection probability $\beta$ and the reproductive rate $\alpha_0$.
Let us mention other related works. Neumann, Oliveto and Witt HZ], Oliveto
and Witt [HI [18] compute precise estimates describing the behavior of the
simple genetic algorithm with the OneMax function. Eremeev [5] derives
polynomial upper bounds for the hitting time of local maxima. Recently, Corus,
Dang, Eremeev and Lehre [1] generalized the fitness-level technique to any
variation operator and obtained further bounds on well-known benchmark
functions.

We study here the classical simple genetic algorithm with the roulette-wheel
selection mechanism, as described in the famous books of Holland
m and Goldberg [9]. We focus on the simplest genetic algorithm but we
think that similar results might be proved for variants of the algorithm. For
instance, our results are not restricted to binary strings and they hold for
any finite alphabet. Similarly, we deal only with the one-point crossover,
but our results depend essentially on the probability $1 - p_C$ of not having a
crossover, thus they can be readily extended to other crossover mechanisms.
We denote by $\ell$ the length of the chromosomes and by $m$ the population size.
We use roulette-wheel selection with replacement. We use the standard
single point crossover and the crossover probability is denoted by $p_C$. We
use independent parallel mutation at each bit and the mutation probability
is denoted by $p_M$. We start the genetic algorithm with an initial population
whose maximal fitness is equal to $f_0^*$ and whose mean fitness is equal to
$\overline{f_0}$. We show that, in the limit of large populations, the dynamics of the
genetic algorithm depends in a critical way on the parameter

$\pi = (f_0^*/\overline{f_0})\,(1-p_C)\,(1-p_M)^\ell$.

• If $\pi < 1$, then the genetic algorithm might operate in a disordered regime:
there exist positive constants $\beta$ and $\kappa$ which do not depend on $m$ such that,
for some fitness landscapes and some initial populations, with probability
larger than $1 - 1/m^\beta$, before generation $\kappa \ln m$, the best fit chromosome
will disappear and until generation $\kappa \ln m$, the mean fitness will stagnate.

• If $\pi > 1$, then the genetic algorithm operates in a quasispecies regime:
there exist positive constants $\kappa, p^*$ which do not depend on $m$ such that,
for any fitness landscape and any initial population, with probability larger
than $p^*$, until generation $\kappa \ln m$, the maximal fitness will not decrease and
before generation $\kappa \ln m$, the mean fitness will increase by a factor $\sqrt{\pi}$.
These results suggest that at each generation, the mutation and crossover
probabilities should be tuned so that

maximal fitness $\times\ (1-p_C)(1-p_M)^\ell >$ mean fitness.

It seems therefore judicious to choose "large" values of $p_M$ and $p_C$
compatible with the condition $\pi > 1$. In the generic situation where $f_0^*$ is
significantly larger than $\overline{f_0}$, this means that the mutation probability
should be of order $1/\ell$; more precisely, the condition $\pi > 1$ implies that

$\ell\, p_M + p_C < \ln(f_0^*/\overline{f_0})$.
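
As a small numerical illustration of this tuning rule, the sketch below computes $\pi$ from a list of fitness values and checks the necessary condition $\ell p_M + p_C < \ln(f_0^*/\overline{f_0})$. The function names `pi_parameter` and `necessary_condition` and the sample population are assumptions made for this example; they are not taken from the paper.

```python
import math

def pi_parameter(fitnesses, p_c, p_m, length):
    """pi = (max fitness / mean fitness) * (1 - p_c) * (1 - p_m)**length."""
    f_max = max(fitnesses)
    f_mean = sum(fitnesses) / len(fitnesses)
    return (f_max / f_mean) * (1.0 - p_c) * (1.0 - p_m) ** length

def necessary_condition(fitnesses, p_c, p_m, length):
    """Necessary (not sufficient) condition for pi > 1:
    length * p_m + p_c < ln(max fitness / mean fitness)."""
    f_max = max(fitnesses)
    f_mean = sum(fitnesses) / len(fitnesses)
    return length * p_m + p_c < math.log(f_max / f_mean)

# Hypothetical population: one chromosome of fitness 4, nine of fitness 1.
fitnesses = [4.0] + [1.0] * 9
value = pi_parameter(fitnesses, p_c=0.3, p_m=0.01, length=20)
print(value, necessary_condition(fitnesses, 0.3, 0.01, 20))
```

The implication only goes one way: the logarithmic inequality follows from $\pi > 1$ through $\ln(1-x) \le -x$, so it can hold while $\pi$ is still below 1.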


2 The model 

In this section, we provide a brief description of the simple genetic
algorithm. The goal of the simple genetic algorithm is to find the global
maxima of a fitness function $f$ defined on $\{0,1\}^\ell$ with values in $]0,+\infty[$.
We consider the most classical and simple version of the genetic algorithm,
as described in Goldberg's book [9]. The genetic algorithm works with a
population of $m$ points of $\{0,1\}^\ell$, called the chromosomes, and it repeats
the following fundamental cycle in order to build the generation $n+1$ from
the generation $n$:

Repeat 

• Select two chromosomes from the generation n 

• Perform the crossover 

• Perform the mutation 

• Put the two resulting chromosomes in generation n + 1 
Until there are m chromosomes in generation n + 1 


When building the generation $n+1$ from the generation $n$, all the random
choices are performed independently. We use the classical genetic
operators, as in Goldberg's book [9], which we recall briefly.


Selection. We use roulette-wheel selection with replacement. The probability
of selecting the $i$-th chromosome $x(i)$ in the population $x$ is given
by the selection distribution defined by

$P(\text{select } i\text{-th chromosome in } x) = \dfrac{f(x(i))}{f(x(1)) + \cdots + f(x(m))}$.


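Crossover. We use the standard single point crossover and the crossover
probability is denoted by $p_C$.

[Illustration: two parent chromosomes cut at the same site exchange their
right-hand parts; a crossover at a given cutting site occurs with
probability $p_C/(\ell - 1)$.]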


Mutation. We use independent parallel mutation at each bit and the
mutation probability is denoted by $p_M$. For instance,

$P(0000000 \to 0101000) = p_M^2\,(1-p_M)^5$.
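
The cycle and the three operators described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `next_generation`, the representation of chromosomes as lists of 0/1 integers, and the requirement of an even population size are choices made for this example.

```python
import random

def next_generation(population, fitness, p_c, p_m, rng):
    """Build generation n+1 from generation n.
    Assumes an even population size and chromosomes of length >= 2."""
    m, length = len(population), len(population[0])
    weights = [fitness(x) for x in population]
    new_population = []
    while len(new_population) < m:
        # Roulette-wheel selection with replacement of two parents.
        a, b = (list(x) for x in rng.choices(population, weights=weights, k=2))
        # Single point crossover with probability p_c,
        # the cut being uniform among the length - 1 sites.
        if rng.random() < p_c:
            cut = rng.randint(1, length - 1)
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        # Independent mutation of each bit with probability p_m.
        for child in (a, b):
            new_population.append(
                [bit ^ 1 if rng.random() < p_m else bit for bit in child])
    return new_population
```

A call such as `next_generation(pop, sum, 0.5, 0.01, random.Random(0))` would use the number of ones as a (hypothetical) fitness; any strictly positive fitness function can be passed instead.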


3 The results 

We denote by $x_0$ the initial population and by $x_0(1),\dots,x_0(m)$ the $m$
chromosomes in the population $x_0$. We denote by $f_0^*$ the maximal fitness
of the chromosomes in $x_0$ and by $\overline{f_0}$ their mean fitness, i.e.,

$f_0^* = \max_{1\le i\le m} f(x_0(i))$, $\qquad \overline{f_0} = \dfrac{1}{m} \sum_{1\le i\le m} f(x_0(i))$.

We present two results to illustrate the contrasting behavior of the genetic
algorithm when $\pi < 1$ and when $\pi > 1$.

The disordered regime. We consider the fitness function $f$ defined by

$\forall u \in \{0,1\}^\ell \qquad f(u) = 2$ if $u = 1\cdots1$, $\quad f(u) = 1$ otherwise.

This corresponds to the sharp peak landscape. The chromosome $1\cdots1$
is called the Master sequence. We start the genetic algorithm from the
population $x_0$ containing one Master sequence $1\cdots1$ and $m-1$ copies of
the chromosome $0\cdots0$. Thus the optimal chromosome is already present
in the population. Our goal is to study its influence on the evolution of the
population. This is a crude model for the following scenario: the genetic
algorithm has been stuck for a long time, and suddenly, by chance, a
chromosome with a superior fitness is found; is this new chromosome likely
to influence the whole population or will it disappear? The next theorem
describes a situation where the mean fitness of the population is unlikely
to increase despite the presence of a very well fit chromosome.
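
To make the setting concrete, the following sketch builds the sharp peak fitness function and the initial population described above, and evaluates the maximal and mean fitness. The names `sharp_peak` and `initial_population` and the value $m = \ell = 100$ are assumptions for this example.

```python
def sharp_peak(u):
    """Fitness 2 for the Master sequence 1...1, fitness 1 otherwise."""
    return 2.0 if all(bit == 1 for bit in u) else 1.0

def initial_population(m, length):
    """One Master sequence followed by m - 1 copies of 0...0."""
    return [[1] * length] + [[0] * length for _ in range(m - 1)]

m = length = 100
x0 = initial_population(m, length)
values = [sharp_peak(u) for u in x0]
f_max = max(values)            # maximal fitness of x0
f_mean = sum(values) / m       # mean fitness of x0, equal to (m + 1) / m
print(f_max, f_mean, f_max / f_mean)
```

For this population the ratio of maximal to mean fitness is $2m/(m+1)$, which is close to 2 when $m$ is large.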


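Theorem 3.1 Let $\pi < 1$ be fixed. We suppose that the parameters are
set so that $\ell = m$ and $(f_0^*/\overline{f_0})(1-p_C)(1-p_M)^\ell = \pi$. There exist strictly
positive constants $\kappa, \beta, m_0$, which depend on $\pi$ only, such that, for the
genetic algorithm starting from $x_0$, for any $m \ge m_0$,

$P\left(\begin{array}{l}
\text{before generation } \kappa \ln m, \text{ the Master sequence disappears} \\
\text{until generation } \kappa \ln m, \text{ the mean fitness is } \le \overline{f_0}\,(1 + [...])
\end{array}\right) \ge 1 - \dfrac{1}{m^\beta}$.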

The quasispecies regime. We consider an arbitrary non-negative fitness
function $f$ and we start the genetic algorithm from a population $x_0$ such
that $f_0^* > \overline{f_0}$. The next theorem describes a situation where the mean
fitness of the population is likely to increase thanks to the influence of the
best fit chromosome.


Theorem 3.2 Let $\pi > 1$ be fixed. We suppose that the parameters are
set so that $(f_0^*/\overline{f_0})(1-p_C)(1-p_M)^\ell = \pi$. There exist strictly positive
constants $\kappa, p^*$ which depend on $\pi$ and the ratio $f_0^*/\overline{f_0}$ only, such that, for
the genetic algorithm starting from $x_0$, for any $\ell, m \ge 1$,

$P\left(\begin{array}{l}
\text{until generation } \kappa \ln m, \text{ the maximal fitness is always } \ge f_0^* \\
\text{before generation } \kappa \ln m, \text{ the mean fitness becomes } \ge \sqrt{\pi}\,\overline{f_0}
\end{array}\right) \ge p^*$.

4 The disordered regime

In this section, we will prove theorem 3.1. The proof has two main steps.
First we define a process $(T_n)_{n\in\mathbb{N}}$ which counts the number of
descendants of the Master sequence in generation $n$. We show that, as long as
$T_n \le m^{1/4}$, the process $(T_n)_{n\in\mathbb{N}}$ is stochastically dominated by a
supercritical Galton-Watson process. Next we define a process $(N_n^*)_{n\in\mathbb{N}}$ which
counts the number of Master sequences present in generation $n$. Note
that $N_n^*$ is in general smaller than $T_n$, because of the mutations and the
crossovers. Indeed a chromosome might have an ancestor which is a Master
sequence and be very different from it. We show then that, as long as
$T_n \le m^{1/4}$, the process $(N_n^*)_{n\in\mathbb{N}}$ is stochastically dominated by a
subcritical Galton-Watson process. The bound on $(N_n^*)_{n\in\mathbb{N}}$ relies on the previous
bound on $(T_n)_{n\in\mathbb{N}}$. We finally invoke a classical argument from the
theory of branching processes to prove that this subcritical Galton-Watson
process becomes extinct before generation $\kappa \ln m$ with probability larger
than $1 - 1/m^\beta$. The computations are tedious, because we need to control
the probabilities of obtaining a Master sequence when applying the various
genetic operators, and the crossover creates correlations between pairs of
adjacent chromosomes.

Let us start with the precise proof. We start the genetic algorithm from
the population $x_0$ containing one Master sequence $1\cdots1$ and $m-1$ copies
of the chromosome $0\cdots0$. Let $\pi < 1$ be fixed. Throughout the proof, we
suppose that $\ell, p_C, p_M$ satisfy $\ell = m$ and

$2(1-p_C)(1-p_M)^\ell = \pi$.

We denote by $X_n$ the population at generation $n$ and by $T_n$ the number
of descendants of the initial Master sequence present in $X_n$. To build the
generation $n+1$, we select (with replacement) $m$ chromosomes from the
population $X_n$. Let us denote by $A_n$ the number of chromosomes selected
in $X_n$ which are descendants of the initial Master sequence. Each of these
chromosomes is the parent of two chromosomes in generation $n+1$ (because
of the crossover operator). Thus we can bound $T_{n+1}$ from above by $2A_n$.
Conditionally on $T_n$, the distribution of $A_n$ is binomial with parameters $m$
and

$\dfrac{2T_n}{2T_n + m - T_n} \le \dfrac{2T_n}{m}$.

Thus, conditionally on $T_n$, the distribution of $T_{n+1}$ is stochastically
dominated by the binomial distribution $2\mathcal{B}(m, 2T_n/m)$, which we write

$T_{n+1} \preceq 2\,\mathcal{B}\Big(m, \dfrac{2}{m} T_n\Big)$.

The symbol $\preceq$ means stochastic domination (see the appendix). We define

$\tau_1 = \inf\{\, n \ge 1 : T_n > m^{1/4} \,\}$,

and we will compute estimates which hold until time $\tau_1$. So we fix $n \ge 1$
and we condition on the event that $\tau_1 > n$. There exists $t_0 > 0$ such that,
for $0 < t < t_0$, we have $\ln(1-t) \ge -2t$. Therefore, for $m$ large enough so
that $2/m^{3/4} < t_0$, we have

$\Big(1 - \dfrac{2}{m} T_n 1_{\{\tau_1 > n\}}\Big)^m \ge \exp\big(-4\,T_n 1_{\{\tau_1 > n\}}\big)$.

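We denote by $\mathcal{P}(\lambda)$ the Poisson law of parameter $\lambda$. By lemma lATdl we
conclude from this inequality that

$\mathcal{B}\Big(m, \dfrac{2}{m} T_n 1_{\{\tau_1 > n\}}\Big) \preceq \mathcal{P}\big(4\,T_n 1_{\{\tau_1 > n\}}\big)$.

Therefore,

$T_{n+1} 1_{\{\tau_1 > n+1\}} \preceq \sum_{k=1}^{T_n 1_{\{\tau_1 > n\}}} V_k$,

where the random variables $(V_k)_{k\ge1}$ are independent identically distributed
with distribution twice the Poisson law of parameter 4. Let $(Z_n)_{n\in\mathbb{N}}$ be a
Galton-Watson process starting from $Z_0 = 1$ with reproduction law $2\mathcal{P}(4)$.
We conclude from the previous inequality that

$\forall n \ge 0 \qquad T_n 1_{\{\tau_1 > n\}} \preceq Z_n$.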
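
The dominating process can be simulated directly. The sketch below is an illustration only: it samples a Galton-Watson process with an arbitrary reproduction law, and uses Knuth's multiplication method for Poisson sampling so that only the standard library is needed. The names `poisson` and `galton_watson` are assumptions for this example.

```python
import math
import random

def poisson(lam, rng):
    """Sample a Poisson(lam) variable with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def galton_watson(offspring, generations, rng):
    """Return [Z_0, ..., Z_generations] with Z_0 = 1, where each individual
    independently produces offspring(rng) children."""
    sizes = [1]
    for _ in range(generations):
        sizes.append(sum(offspring(rng) for _ in range(sizes[-1])))
    return sizes

rng = random.Random(1)
# Reproduction law "twice a Poisson law of parameter 4" (mean 8, supercritical).
print(galton_watson(lambda r: 2 * poisson(4.0, r), 3, rng))
```

Since the mean offspring number is 8, the population size grows roughly like $8^n$, so only a few generations should be simulated this way.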

We denote by $X_n(1),\dots,X_n(m)$ the $m$ chromosomes of the population $X_n$.
Let $N_n^*$ be the number of Master sequences present in the population at
time $n$:

$\forall n \ge 0 \qquad N_n^* = \mathrm{card}\,\{\, i \in \{1,\dots,m\} : X_n(i) = 1\cdots1 \,\}$.

We want to control $N_{n+1}^*$ conditionally on the knowledge of $N_n^*$ and $T_n$.
A difficulty is that the crossover operator creates correlations between the
chromosomes of $X_{n+1}$. However, conditionally on $X_n$, the pairs of
consecutive chromosomes

$(X_{n+1}(1), X_{n+1}(2)), \dots, (X_{n+1}(m-1), X_{n+1}(m))$

are i.i.d. Therefore, we can write $N_{n+1}^*$ as the sum

$N_{n+1}^* = \sum_{i=1}^{m/2} Y_i$,

where $Y_i$ is the number of Master sequences in the $i$-th pair
$(X_{n+1}(2i-1), X_{n+1}(2i))$. Our strategy consists in estimating the conditional
distribution of the $Y_i$'s, knowing the population $X_n$. Conditionally on $X_n$, the
random variables $Y_i$, $1 \le i \le m/2$, are i.i.d. with values in $\{0,1,2\}$, yet
the computations are a bit lengthy and tedious because we have to consider
all the possible cases, depending on whether the parents of
$X_{n+1}(2i-1), X_{n+1}(2i)$ belong or not to the progeny of the initial Master sequence.
So let us focus on one pair of chromosomes, for instance the first one
$(X_{n+1}(1), X_{n+1}(2))$. We have to estimate all the conditional probabilities

$P\big(\text{there are 0, 1 or 2 Master sequences in } (X_{n+1}(1), X_{n+1}(2)) \mid X_n\big)$.

To control these probabilities, we introduce the time $\tau_2$, when a mutant,
not belonging to the progeny of the initial Master sequence, has at least
$\sqrt{\ell}$ ones. We set

$\tau_2 = \inf\{\, n \ge 1 : \text{a chromosome of } X_n \text{ not in the progeny of the initial Master sequence has at least } \sqrt{\ell} \text{ ones} \,\}$.


Let $\lambda > 0$ be such that $\pi/2 \ge \exp(-\lambda)$. We have then

$(1-p_M)^\ell = \dfrac{\pi}{2(1-p_C)} \ge \dfrac{\pi}{2} \ge \exp(-\lambda)$.

Notice that $\lambda$ depends only on $\pi$, and not on $\ell$ or $p_M$. By lemma fA.Sl
the binomial law $\mathcal{B}(\ell, p_M)$ is then stochastically dominated by the Poisson
law $\mathcal{P}(\lambda)$. We will use repeatedly the bound on the tail of the Poisson law
given in lemma A.4:

$P\big(\text{a given chromosome undergoes at least } t \text{ mutations from one generation to the next}\big) \le \Big(\dfrac{\lambda e}{t}\Big)^t$.

When using this bound, the value of $t$ will be a function of $\ell$. We will
always take $\ell$ large enough, so that the value of $t$ will be larger than $\lambda$. We
prove next a bound on $\tau_2$.
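
The tail bound used here is the classical Chernoff-type estimate $P(X \ge t) \le (\lambda e / t)^t$ for a Poisson variable $X$ of parameter $\lambda$ and $t > \lambda$. A quick numerical sanity check, given only as an illustration, is sketched below; the function names and the tested values are assumptions for this example.

```python
import math

def poisson_tail(lam, t, terms=200):
    """P(X >= t) for X ~ Poisson(lam) and an integer t >= 0, summed
    term by term from k = t (truncated after `terms` terms, so the
    value is a very slight underestimate)."""
    term = math.exp(-lam + t * math.log(lam) - math.lgamma(t + 1))
    total = 0.0
    for k in range(t, t + terms):
        total += term
        term *= lam / (k + 1)
    return total

def tail_bound(lam, t):
    """Upper bound (lam * e / t)**t, valid for t > lam."""
    return (lam * math.e / t) ** t

for lam, t in [(0.5, 3), (1.0, 5), (2.0, 10)]:
    print(lam, t, poisson_tail(lam, t), tail_bound(lam, t))
```

Summing the tail directly, rather than computing one minus the lower part of the distribution, avoids floating-point cancellation when the tail is extremely small.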


Lemma 4.1 For $m \ge 2$ and for $\ell$ large enough, we have

$P\big(\tau_2 \le \tfrac{1}{5} \ln \ell\big) \le 1 - \exp\big(-m \exp(-[...])\big)$.

Proof. If $\tau_2 \le n$, then, before time $n$, a chromosome has been created
with at least $\sqrt{\ell}$ ones, and whose genealogy does not contain the initial
Master sequence. We shall compute an upper bound on the number of
ones appearing in the genealogy of such a chromosome at generation $n$.
Let us define $D_n$ as the maximum number of ones in a chromosome of the
generation $n$, which does not belong to the progeny of the initial Master
sequence. These ones must have been created by mutation. Let us consider
a chromosome of the generation $n+1$, which does not belong to the progeny
of the initial Master sequence. The number of ones in each of its two parents
was at most $D_n$. After crossover between these two parents, the number
of ones was at most $2D_n$. After mutation, the number of ones was at most

$D_{n+1} \le 2D_n + \max\{\, \text{number of mutations occurring on a chromosome between generation } n \text{ and } n+1 \,\}$.

We first control the last term. Let $n \ge 1$ and let us define the event $\mathcal{E}(n)$
by

$\mathcal{E}(n) = \{\, \text{until generation } n, \text{ during the mutation process, the number of mutations occurring on a given chromosome is at most } \ell^{1/4} \,\}$.

We have

$P(\mathcal{E}(n)) = \Big(1 - P\big(\text{a given chromosome undergoes more than } \ell^{1/4} \text{ mutations}\big)\Big)^{mn}$.

Using the bound given in lemma A.4 we obtain that, for $\ell^{1/4} > \lambda$,

$P(\mathcal{E}(n)) \ge \Big(1 - \Big(\dfrac{\lambda e}{\ell^{1/4}}\Big)^{\ell^{1/4}}\Big)^{mn}$,

whence, for $\ell$ large enough,

$P(\mathcal{E}(n)) \ge \exp\big(-mn \exp(-[...])\big)$.

Suppose that the event $\mathcal{E}(n)$ occurs. We have then

$\forall k \in \{0,\dots,n-1\} \qquad D_{k+1} \le 2D_k + \ell^{1/4}$.

Dividing by $2^{k+1}$ and summing from $k = 0$ to $n-1$, we get (recall that $D_0 = 0$)

$\dfrac{D_n}{2^n} \le \sum_{k=0}^{n-1} \dfrac{\ell^{1/4}}{2^{k+1}} \le \ell^{1/4}$.

Therefore, if $2^n < \ell^{1/4}$ and if the event $\mathcal{E}(n)$ occurs, then $\tau_2 > n$. Taking
$n = (\ln \ell)/5$, we obtain the estimate stated in the lemma. □
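
The doubling recursion in this proof can be checked numerically. The sketch below iterates the worst case $D_{k+1} = 2D_k + \ell^{1/4}$ from $D_0 = 0$ and verifies that, for $n = \lfloor (\ln \ell)/5 \rfloor$, the result stays below $\sqrt{\ell}$. The function name `worst_case_ones` and the tested lengths are assumptions for this example.

```python
import math

def worst_case_ones(length, n):
    """Iterate D_{k+1} = 2 * D_k + length**(1/4) from D_0 = 0, n times.
    The closed form is (2**n - 1) * length**(1/4)."""
    d = 0.0
    for _ in range(n):
        d = 2.0 * d + length ** 0.25
    return d

for length in (16, 10**3, 10**6, 10**9):
    n = int(math.log(length) / 5)
    print(length, n, worst_case_ones(length, n), math.sqrt(length))
```

The check works because $2^{(\ln \ell)/5} = \ell^{(\ln 2)/5}$ and $(\ln 2)/5 \approx 0.139 < 1/4$.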

We recall that

$\tau_1 = \inf\{\, n \ge 1 : T_n > m^{1/4} \,\}$.

We set also

$\tau_0 = \inf\{\, n \ge 1 : N_n^* = 0 \,\}$.

We shall compute a bound on $N_n^*$ until time $\tau = \min(\tau_0, \tau_1, \tau_2)$. Our goal
is to show that, for $m$ large enough, the process

$\big(N_n^* 1_{\{\tau > n\}}\big)_{n\in\mathbb{N}}$

is stochastically dominated by a subcritical Galton-Watson process. So let
$n \ge 0$ and let us suppose that $\tau > n$ and that we know the population $X_n$.
We estimate the probability that exactly one Master sequence is present
in $(X_{n+1}(1), X_{n+1}(2))$. We envisage different scenarios, depending on the
number of descendants of the initial Master sequence among the two parents
of these chromosomes.

• First scenario. The two parents are descendants of the Master sequence.
The probability of selecting two such parents is bounded from above by

$\Big(\dfrac{2T_n}{2T_n + m - T_n}\Big)^2 \le \Big(\dfrac{2T_n}{m}\Big)^2 \le \dfrac{4}{m\sqrt{m}}$.


• Second scenario. Exactly one of the parents is a descendant of the Master
sequence and a crossover has occurred. The total number of ones present
in the parents is at most $\ell + \sqrt{\ell}$. After crossover, the probability that one
of the two resulting chromosomes has at least $\ell - \sqrt{\ell}$ ones is less than $4/\sqrt{\ell}$.
Indeed, this can happen only if, either on the left of the cutting site, or on
its right, there are at most $\sqrt{\ell}$ zeroes. The most favorable situation is when
all the ones are at the end or at the beginning of the chromosome which is
not a descendant of the Master sequence, in which case we have $2\sqrt{\ell}$ cutting
sites which lead to the desired result. Otherwise, both chromosomes after
crossover have at least $\sqrt{\ell}$ zeroes, and the probability to transform these
zeroes into ones through mutations is less than $(\lambda e/\sqrt{\ell})^{\sqrt{\ell}}$. We conclude
that the probability of this scenario is bounded from above by

$\Big(\dfrac{4}{\sqrt{\ell}} + 2\Big(\dfrac{\lambda e}{\sqrt{\ell}}\Big)^{\sqrt{\ell}}\Big)\, \dfrac{2N_n^*}{2N_n^* + m - N_n^*}$.

• Third scenario. Exactly one of the parents is a descendant of the Master
sequence and no crossover has occurred. A Master sequence can be created
from the chromosome not in the progeny of the initial Master sequence;
this would require $\ell - \sqrt{\ell}$ mutations, and the corresponding probability is
bounded from above by

$\Big(\dfrac{\lambda e}{\ell - \sqrt{\ell}}\Big)^{\ell - \sqrt{\ell}}$.

The other possibility is that a Master sequence is obtained from the
chromosome belonging to the progeny of the initial Master sequence. This
chromosome was either a Master sequence, in which case the replication
has to be exact, or it was differing from the Master sequence, in which case
some mutations are required. The corresponding probability is bounded
from above by

$2(1-p_C)\big((1-p_M)^\ell + p_M\big)\, \dfrac{2N_n^*}{2N_n^* + m - N_n^*}$.

• Fourth scenario. None of the parents is a descendant of the Master
sequence. Until time $\tau_2$, the chromosomes which are not descendants of
the Master sequence have at most $\sqrt{\ell}$ ones. To create a Master sequence
starting from two such parents requires at least $\ell - 2\sqrt{\ell}$ mutations. The
corresponding probability is bounded from above by

$\Big(\dfrac{\lambda e}{\ell - 2\sqrt{\ell}}\Big)^{\ell - 2\sqrt{\ell}}$.

Putting together the previous inequalities, we conclude that 


P 


there is exactly one Master sequence 
present in (X„ + l(l),-’fn+l(2)) 

Ae\v^' 




< 


4 / 4 „/Ae\ 

^ + (^ + 2^1 


2n: 


my/rn \y/l 

+ 2(1 - pc) ((1 - PmY + Pm^ 


2N*+m- N* 

2n: 


( ) 


2N* + m-N* 




We rewrite the previous inequalities in the case $\ell = m$ and for $m$ large.
Since $2(1-p_M)^\ell \ge \pi$, then $p_M \le -\frac{1}{\ell}\ln(\pi/2)$. Let $\varepsilon > 0$ be such that
$\pi(1+5\varepsilon) < 1$. For $m$ large enough and $n < \tau$, we have


/ there is exactly one Master sequence 
V present in (Ai„+i(l),X„ +i(2)) 


) < —7r(l+£)iV* 
/ m 


Tr, 

N*J - 


Similar computations yield that there exists a positive constant $c$ such that,
for $m$ large enough and $n < \tau$,

$P\big(\text{both } X_{n+1}(1), X_{n+1}(2) \text{ are Master sequences}\big) \le \dfrac{c}{m^{3/2}}$.
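Coming back to the initial equality for $N_{n+1}^*$, we conclude that, for $m$ large
enough, the law of $N_{n+1}^* 1_{\{\tau > n+1\}}$ is stochastically dominated by the sum
of two independent binomial random variables as follows:

$N_{n+1}^* 1_{\{\tau > n+1\}} \preceq \mathcal{B}\Big(\dfrac{m}{2}, \dfrac{2}{m}\pi(1+2\varepsilon) N_n^* 1_{\{\tau > n\}}\Big) + 2\,\mathcal{B}\Big(\dfrac{m}{2}, \dfrac{c}{m^{3/2}}\Big)$.

For $m$ large, these two binomial laws are in turn stochastically dominated
by two Poisson laws. More precisely, for $m$ large enough,

$\Big(1 - \dfrac{2}{m}\pi(1+2\varepsilon) N_n^* 1_{\{\tau > n\}}\Big)^{m/2} \ge \exp\big(-\pi(1+3\varepsilon) N_n^* 1_{\{\tau > n\}}\big)$,

$\Big(1 - \dfrac{c}{m^{3/2}}\Big)^{m/2} \ge \exp(-\varepsilon)$.

Lemma A.3 yields then that

$N_{n+1}^* 1_{\{\tau > n+1\}} \preceq \mathcal{P}\big(\pi(1+3\varepsilon) N_n^* 1_{\{\tau > n\}}\big) + 2\,\mathcal{P}(\varepsilon)$.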

The point is that we have got rid of the variable m in the upper bound, so we 
are now in position to compare with a Galton-Watson process. 

Let $(Y'_n)_{n\ge1}$ be a sequence of i.i.d. random variables with law $\mathcal{P}(\pi(1+3\varepsilon))$,
let $(Y''_n)_{n\ge1}$ be a sequence of i.i.d. random variables with law $\mathcal{P}(\varepsilon)$, both
sequences being independent. The previous stochastic inequality can be
rewritten as

$N_{n+1}^* 1_{\{\tau > n+1\}} \preceq \Big(\sum_{k=1}^{N_n^* 1_{\{\tau > n\}}} Y'_k\Big) + 2Y''_1$.

This implies further that

$N_{n+1}^* 1_{\{\tau > n+1\}} \preceq \sum_{k=1}^{N_n^* 1_{\{\tau > n\}}} (Y'_k + 2Y''_k)$. $\qquad (*)$

Let $\nu^*$ be the law of $Y'_1 + 2Y''_1$ and let $(Z_n^*)_{n\ge0}$ be a Galton-Watson process
starting from $Z_0^* = 1$ with reproduction law $\nu^*$. We prove finally that, for
$m$ large enough,

$\forall n \ge 0 \qquad N_n^* 1_{\{\tau > n\}} \preceq Z_n^*$.

We suppose that $m$ is large enough so that the stochastic inequality $(*)$
holds and we proceed by induction on $n$. For $n = 0$, we have

$N_0^* 1_{\{\tau > 0\}} = 1 \le Z_0^* = 1$.

Let $n \ge 0$ and suppose that the inequality holds at rank $n$. Inequality $(*)$
yields

$N_{n+1}^* 1_{\{\tau > n+1\}} \preceq \sum_{k=1}^{N_n^* 1_{\{\tau > n\}}} (Y'_k + 2Y''_k) \preceq \sum_{k=1}^{Z_n^*} (Y'_k + 2Y''_k) = Z_{n+1}^*$.

Thus the inequality holds at rank $n+1$ and the induction is completed.
Moreover we have

$E(\nu^*) = E(Y'_1 + 2Y''_1) = \pi(1+5\varepsilon) < 1$.

Thus the Galton-Watson process $(Z_n^*)_{n\ge0}$ is subcritical.

We complete now the proof of theorem 3.1. Let $\kappa, c_1 > 0$ be constants
associated to the Galton-Watson process $(Z_n)_{n\ge0}$ as in proposition A.7.
We suppose that $\kappa < 1/5$, so that we can use the estimate of lemma 4.1. Let
$c^* > 0$ be a constant associated to the subcritical Galton-Watson process
$(Z_n^*)_{n\ge0}$ as in lemma ETbl. We have then

$P(\tau_0 > \kappa \ln m) \le$

$P(\tau_0 > \kappa \ln m,\ \tau < \kappa \ln m) + P\big(N^*_{\lfloor \kappa \ln m \rfloor} > 0,\ \tau \ge \kappa \ln m\big)$

$\le P(\tau_1 < \kappa \ln m) + P(\tau_2 < \kappa \ln m) + P\big(Z^*_{\lfloor \kappa \ln m \rfloor} > 0\big)$

< —— + 1 — exp(—mexp ( — + exp(—c* [k InmJ). 

This inequality yields the estimate stated in theorem 3.1.

5 The quasispecies regime 

In this section, we will prove theorem 3.2. We start the genetic algorithm
with an initial population whose maximal fitness is equal to $f_0^*$ and whose
mean fitness is equal to $\overline{f_0}$. For $x = (x(1),\dots,x(m))$ a population, we
define $N(x, f_0^*)$ as the number of chromosomes in $x$ whose fitness is larger
than or equal to $f_0^*$:

$N(x, f_0^*) = \mathrm{card}\,\{\, i \in \{1,\dots,m\} : f(x(i)) \ge f_0^* \,\}$.

We denote by $X_n$ the population at generation $n$ and by $X_n(1),\dots,X_n(m)$
the $m$ chromosomes of $X_n$. We define a stopping time $\tau$ by

$\tau = \inf\Big\{\, n \ge 1 : \dfrac{1}{m}\big(f(X_n(1)) + \cdots + f(X_n(m))\big) \ge \sqrt{\pi}\,\overline{f_0} \,\Big\}$.

Our goal is to control the time $\tau$; more precisely, we would like to prove
that $\tau$ is less than $\kappa \ln m$ with high probability. Unfortunately, the process
$(N(X_n, f_0^*))_{n\ge0}$ is very complicated; it is not even a Markov process. Our
strategy is to construct an auxiliary Markov chain which is considerably
simpler and which bounds $(N(X_n, f_0^*))_{n\ge0}$ from below until time $\tau$. The
production of chromosomes with fitness larger than or equal to $f_0^*$ from one
generation to the next can be decomposed into two distinct mechanisms:

• chromosomes which are an exact copy of one of their parents; 

• chromosomes which have undergone mutation or crossover events. 

We will bound from below the process $(N(X_n, f_0^*))_{n\ge0}$ by neglecting the
second mechanism. The key point is that the law of the number of
chromosomes created in generation $n+1$ through the first mechanism depends
only on the value $N(X_n, f_0^*)$ and not on the detailed composition of the
population at time $n$. Therefore we are able to obtain a lower process which
is a Markov chain. We denote this process by $(N_n)_{n\ge0}$. We proceed next
to its precise definition. Suppose that in the generation $n$, we have $i$
chromosomes of fitness larger than or equal to $f_0^*$, and that the mean fitness
is still below $\sqrt{\pi}\,\overline{f_0}$, that is, we condition on the event $N(X_n, f_0^*) = i$ and
$\tau > n$. Let us look at the first pair of chromosomes of generation $n+1$.
The probability to select from the generation $n$ a chromosome of fitness
larger than or equal to $f_0^*$ is at least $i f_0^*/(m\sqrt{\pi}\,\overline{f_0})$. The probability that
no crossover has occurred is $1 - p_C$. The probability that no mutation has
occurred on a given chromosome is $(1-p_M)^\ell$. Thus the probability that the
first chromosome of the generation $n+1$ is an exact copy of a chromosome
of generation $n$ having fitness larger than or equal to $f_0^*$ is at least

$\dfrac{i f_0^*}{m\sqrt{\pi}\,\overline{f_0}}\,(1-p_C)(1-p_M)^\ell$.

However the crossover creates correlations between adjacent chromosomes,
so the distribution of $N(X_{n+1}, f_0^*)$ cannot be taken simply as a binomial law.
Conditionally on the event that $N(X_n, f_0^*) = i$ and $\tau > n$, a correct lower
bound on $N(X_{n+1}, f_0^*)$ is given by the sum

$\sum_{k=1}^{m/2} Z_k\,(Y_{2k-1} + Y_{2k})$,

where $Z_1,\dots,Z_{m/2}$ are Bernoulli with parameter $1 - p_C$, and $Y_1,\dots,Y_m$
are Bernoulli with parameter

$\dfrac{i f_0^*}{m\sqrt{\pi}\,\overline{f_0}}\,(1-p_M)^\ell$

and they are all independent. The variable $Z_k$ is 1 if there was no crossover
between the chromosomes of the $k$-th pair and 0 otherwise. The variable $Y_k$
is 1 if the $k$-th chromosome selected has fitness larger than or equal to $f_0^*$
and it is not affected by any mutation. We obtain that, for $j \in \{0,\dots,m\}$,


$P\big(N(X_{n+1}, f_0^*) \ge j \mid N(X_n, f_0^*) = i,\ \tau > n\big) \ge P\Big(\sum_{k=1}^{m/2} Z_k\,(Y_{2k-1} + Y_{2k}) \ge j\Big)$.


We compute the right-hand side and we are led to define the transition
matrix of the Markov chain $(N_n)_{n\ge0}$ by setting, for $i, j \in \{0,\dots,m\}$,

$P(N_{n+1} = j \mid N_n = i) = P\Big(\sum_{k=1}^{m/2} Z_k\,(Y_{2k-1} + Y_{2k}) = j\Big)$,

where the Bernoulli parameter of the variables $Y_k$ is the one associated with $i$.
The above inequality can then be rewritten as: for $i, j \in \{0,\dots,m\}$,

$P\big(N(X_{n+1}, f_0^*) \ge j \mid N(X_n, f_0^*) = i,\ \tau > n\big) \ge P(N_{n+1} \ge j \mid N_n = i)$.
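
One step of this lower-bounding chain is easy to simulate. The sketch below is an illustration under stated assumptions: `ratio` stands for $f_0^*/\overline{f_0}$, the population size is taken even, the function names are hypothetical, and the Bernoulli parameter is capped at 1 as a safeguard for states that cannot occur before time $\tau$.

```python
import math
import random

def copy_probability(i, m, ratio, p_c, p_m, length):
    """Bernoulli parameter of the Y_k when N_n = i:
    i * ratio * (1 - p_m)**length / (m * sqrt(pi))."""
    pi = ratio * (1.0 - p_c) * (1.0 - p_m) ** length
    return min(1.0, i * ratio * (1.0 - p_m) ** length / (m * math.sqrt(pi)))

def expected_next(i, m, ratio, p_c, p_m, length):
    """Mean of the sum of Z_k * (Y_{2k-1} + Y_{2k}) over the m/2 pairs."""
    return m * (1.0 - p_c) * copy_probability(i, m, ratio, p_c, p_m, length)

def lower_chain_step(i, m, ratio, p_c, p_m, length, rng):
    """Sample N_{n+1} given N_n = i."""
    q = copy_probability(i, m, ratio, p_c, p_m, length)
    total = 0
    for _ in range(m // 2):
        z = rng.random() < 1.0 - p_c      # no crossover in this pair
        y1 = rng.random() < q             # first chromosome is an exact copy
        y2 = rng.random() < q             # second chromosome is an exact copy
        total += z * (y1 + y2)
    return total
```

As long as the cap is inactive, a direct computation gives a mean of $i\sqrt{\pi}$ for the next state, so each individual of the lower chain has $\sqrt{\pi} > 1$ offspring on average when $\pi > 1$.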


From lemma A.2 this implies furthermore that, for any non-decreasing
function $\phi : \mathbb{N} \to \mathbb{R}$, for $i \in \{0,\dots,m\}$,

$E\big(\phi(N(X_{n+1}, f_0^*)) \mid N(X_n, f_0^*) = i,\ \tau > n\big) \ge E\big(\phi(N_{n+1}) \mid N_n = i\big)$. $\qquad$ (o)

Let us focus a bit on the Markov chain $(N_n)_{n\ge0}$. Its state space is
$\{0,\dots,m\}$. The null state is an absorbing state because we neglect the
mutations for producing chromosomes of fitness at least $f_0^*$. A key point
to exploit inequality (o) is the following result.


Proposition 5.1 The Markov chain is monotone. 

Proof. The definition of monotone Markov chain is recalled in appendix 
(see definition lA.il) . The easiest way to prove the monotonicity is to build 
an adequate coupling. For n G N and k < m/2, let be a Bernoulli 
random variable with parameter 1 — pc and be two random 

variables whose distribution is uniform over [0,1]. We suppose that all the 
above random variables are independent. For i G { 0,..., m }, we define 
Nq = i and 

mjl 

Vn > 0 ^ + l{c/"fc<^-(^n)}) • 

k=0 

This way all the chains G {0,...,m}, are coupled and a 

straightforward induction yields that 

Vi <j VnGN Nl< Nl. 


This yields the desired conclusion. □ 
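The coupling argument can be checked numerically. The sketch below drives all the chains started from $0,\dots,m$ with the same Bernoulli and uniform variables and verifies that their order is preserved at every step. The names `coupled_step` and `run_coupling`, and the particular non-decreasing map used for the Bernoulli parameter, are hypothetical choices made for the illustration; only the monotonicity of that map matters.

```python
import random


def coupled_step(states, m, p_c, eps, rng):
    """One step of all chains, driven by the same random variables."""
    half = m // 2
    no_crossover = [rng.random() < 1 - p_c for _ in range(half)]
    uniforms = [(rng.random(), rng.random()) for _ in range(half)]
    new_states = []
    for s in states:
        e = eps(s)
        total = 0
        for z, (u1, u2) in zip(no_crossover, uniforms):
            if z:
                # Two indicators per pair that did not undergo crossover.
                total += (u1 < e) + (u2 < e)
        new_states.append(total)
    return new_states


def run_coupling(m=20, p_c=0.3, steps=30, seed=0):
    rng = random.Random(seed)
    # Hypothetical non-decreasing parameter, capped at 1.
    eps = lambda s: min(1.0, 1.5 * s / m)
    states = list(range(m + 1))  # the chain with index i starts from i
    history = [states]
    for _ in range(steps):
        states = coupled_step(states, m, p_c, eps, rng)
        history.append(states)
    return history


history = run_coupling()
print(all(row == sorted(row) for row in history))  # prints True
```

The order is preserved for every seed, not only on average: a larger current state gives a larger threshold, so each indicator of the larger chain dominates the corresponding indicator of the smaller one.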

We are interested in the process $(N(X_n, f_0^*))_{n\ge 0}$ until time $\tau$. In order to prove a convenient stochastic inequality, we will work with the process $(N_n^*)_{n\ge 0}$ defined by

$$\forall n \ge 0 \qquad N_n^* = \begin{cases} N(X_n, f_0^*) & \text{if } \tau > n, \\ m & \text{if } \tau \le n. \end{cases}$$

Proposition 5.2 We suppose that the Markov chain $(N_n)_{n\in\mathbb{N}}$ starts from $N_0 = 1$. For any $n \ge 0$, we have the stochastic inequality

$$(N_0^*,\dots,N_n^*) \succeq (N_0,\dots,N_n).$$

For the above statement, we work with the product order on $\mathbb{N}^{n+1}$:

$$(i_0,\dots,i_n) \le (j_0,\dots,j_n) \iff i_0 \le j_0,\ \dots,\ i_n \le j_n .$$

The stochastic domination inequality stated in proposition 5.2 means that: for any non-decreasing function $\phi : \mathbb{N}^{n+1} \to \mathbb{R}_+$, we have

$$E\big(\phi(N_0^*,\dots,N_n^*)\big) \ge E\big(\phi(N_0,\dots,N_n)\big).$$

Proof. We proceed by induction on $n$. For $n = 0$, we have

$$N_0^* = N(X_0, f_0^*) \ge 1 = N_0 .$$

Suppose that the result has been proved until rank $n$ for some $n \ge 0$. Let $\phi : \mathbb{N}^{n+2} \to \mathbb{R}_+$ be a non-decreasing function. We write

$$E\big(\phi(N_0^*,\dots,N_{n+1}^*)\big) = \sum_{i_0,\dots,i_n} E\big(\phi(N_0^*,\dots,N_{n+1}^*) \,\big|\, N_0^*=i_0,\dots,N_n^*=i_n\big)\, P(N_0^*=i_0,\dots,N_n^*=i_n).$$

Let $i_0,\dots,i_n$ be fixed. Suppose first that $i_n < m$. The event $\{N_n^* = i_n\}$ implies that $\tau > n$ and $N_n^* = N(X_n, f_0^*)$. The map

$$i \in \{0,\dots,m\} \mapsto \phi(i_0,\dots,i_n,i)$$

is non-decreasing. Using the stochastic inequality (o), we obtain

$$E\big(\phi(N_0^*,\dots,N_{n+1}^*) \,\big|\, N_0^*=i_0,\dots,N_n^*=i_n\big) = E\big(\phi(i_0,\dots,i_n,N(X_{n+1},f_0^*)) \,\big|\, N(X_0,f_0^*)=i_0,\dots,N(X_n,f_0^*)=i_n\big) \ge E\big(\phi(i_0,\dots,i_n,N_{n+1}) \mid N_n=i_n\big).$$

Let us define a function $\psi : \mathbb{N}^{n+1} \to \mathbb{R}_+$ by setting

$$\psi(i_0,\dots,i_n) = E\big(\phi(i_0,\dots,i_n,N_{n+1}) \mid N_n = i_n\big).$$

If $i_n = m$, then we have also

$$E\big(\phi(N_0^*,\dots,N_{n+1}^*) \,\big|\, N_0^*=i_0,\dots,N_n^*=i_n\big) = \phi(i_0,\dots,i_{n-1},m,m) \ge \psi(i_0,\dots,i_n).$$

From the previous inequalities, we conclude that

$$E\big(\phi(N_0^*,\dots,N_{n+1}^*)\big) \ge E\big(\psi(N_0^*,\dots,N_n^*)\big).$$

Since the function $\phi$ is non-decreasing on $\mathbb{N}^{n+2}$ and since the Markov chain $(N_n)_{n\ge 0}$ is monotone (by proposition 5.1), the function $\psi$ is also non-decreasing on $\mathbb{N}^{n+1}$. Now, the induction hypothesis yields that

$$E\big(\psi(N_0^*,\dots,N_n^*)\big) \ge E\big(\psi(N_0,\dots,N_n)\big) = E\big(\phi(N_0,\dots,N_n,N_{n+1})\big)$$

and the induction step is completed. □

If $N(X_n, f_0^*) > m/\sqrt{\pi}$, then necessarily

$$\frac{1}{m}\big(f(X_n(1)) + \dots + f(X_n(m))\big) \ge \frac{1}{\sqrt{\pi}}\, f_0^* \ge \sqrt{\pi}\,\bar f_0$$

and thus $\tau \le n$. The above coupling inequality implies therefore that

$$P(\tau \le n) \ge P\big(\exists k \le n \quad N(X_k, f_0^*) > m/\sqrt{\pi}\big) \ge P\big(\exists k \le n \quad N_k > m/\sqrt{\pi}\big).$$

We study next the dynamics of the Markov chain $(N_n)_{n\ge 0}$ on $\{0,\dots,m\}$. Our goal is to prove that, for some $\kappa > 0$, with a probability larger than a constant independent of $m$, this Markov chain will reach a value strictly larger than $m/\sqrt{\pi}$ before time $\kappa \ln m$. Let us explain briefly the heuristics for this result. The transition mechanism of the chain is built with the help of i.i.d. Bernoulli random variables, some of parameter $1-p_C$ and some of parameter $\varepsilon_m(i)$, $i \in \{0,\dots,m\}$. The typical number of pairs of chromosomes with no crossover from one generation to another is $(1-p_C)m/2$ and we can control accurately the deviations from this typical value. For $i$ small compared to $m$, the parameter $\varepsilon_m(i)$ is of order a constant times $i/m$, thus, conditionally on the event that $N_n = i$, the distribution of $N_{n+1}$ is roughly the binomial law of parameters $m(1-p_C)$ and a constant times $i/m$. In this regime, it can be approximated adequately by a Poisson law of parameter

$$m(1-p_C)\,\varepsilon_m(i) = i\sqrt{\pi}.$$

We conclude that, as long as $N_n$ is small compared to $m$, we have

$$E(N_{n+1}) \sim \sqrt{\pi}\,E(N_n).$$
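The parameter of the Poisson approximation can be checked by direct substitution. Assuming the definition of $\pi$ given in the abstract, $\pi = (f_0^*/\bar f_0)(1-p_C)(1-p_M)^\ell$, and the Bernoulli parameter $\varepsilon_m(i)$ introduced with the lower bound on $N(X_{n+1}, f_0^*)$, one finds:

```latex
\begin{align*}
m(1-p_C)\,\varepsilon_m(i)
  &= m(1-p_C)\,\frac{i f_0^*}{m\sqrt{\pi}\,\bar f_0}\,(1-p_M)^\ell \\
  &= \frac{i}{\sqrt{\pi}}\cdot\frac{f_0^*}{\bar f_0}\,(1-p_C)(1-p_M)^\ell
   = \frac{i\,\pi}{\sqrt{\pi}}
   = i\sqrt{\pi}.
\end{align*}
```

Each generation therefore multiplies the expected number of copies by roughly $\sqrt{\pi}$, which exceeds 1 exactly when $\pi > 1$.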




In the next proposition, we derive a rigorous estimate, which shows indeed that the Markov chain $(N_n)_{n\ge 0}$ is likely to grow geometrically until a value larger than $m/\sqrt{\pi}$. The proof is elementary, in the sense that it relies essentially on two classical exponential inequalities (which are recalled in the appendix). This proof is an adaptation of the proof of proposition 6.7 in [3]. In proposition 5.4, we shall then bound from below the probability of hitting a value larger than $m/\sqrt{\pi}$ before time $\kappa \ln m$ and this will conclude the proof of theorem 3.2.


Proposition 5.3 Let $\pi > 1$ be fixed. There exist $\rho > 1$, $c_0 > 0$, $m_0 \ge 1$, which depend on $\pi$ and the ratio $f_0^*/\bar f_0$ only, such that: for any set of parameters $\ell, p_C, p_M$ satisfying $\pi = (f_0^*/\bar f_0)(1-p_C)(1-p_M)^\ell$, we have

$$\forall m \ge m_0 \quad \forall i \le m/\sqrt{\pi} \qquad P(N_{n+1} < \rho i \mid N_n = i) \le \exp(-c_0 i).$$

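Proof. We recall that, conditionally on $N_n = i$, the law of $N_{n+1}$ is the same as the law of the random variable

$$\sum_{k=1}^{2B_n} Y_k^i ,$$

where $B_n$ is distributed according to the binomial law $\mathcal{B}(m/2, 1-p_C)$, the variables $Y_k^i$, $k \in \mathbb{N}$, $i \in \{1,\dots,m\}$, are Bernoulli random variables with parameter $\varepsilon_m(i)$, and all these random variables are independent. Let $\varepsilon > 0$ be such that $\sqrt{\pi}(1-2\varepsilon) > 1$ and let

$$l(m,\varepsilon) = \Big\lfloor \frac{m}{2}(1-p_C)(1-\varepsilon) + 1 + \frac{m}{4}(1-p_C)\varepsilon \Big\rfloor .$$

For $m$ large enough, we have

$$l(m,\varepsilon) \le \frac{m}{2}(1-p_C)\Big(1-\frac{\varepsilon}{2}\Big) + 1 < \frac{m}{2}(1-p_C).$$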


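Let $\rho$ be such that $1 < \rho < \sqrt{\pi}(1-2\varepsilon)$. We have

$$P(N_{n+1} < \rho i \mid N_n = i) = P\Big(\sum_{k=1}^{2B_n} Y_k^i < \rho i\Big) \le P\big(B_n < l(m,\varepsilon)\big) + P\Big(\sum_{k=1}^{2l(m,\varepsilon)} Y_k^i < \rho i\Big).$$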

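We control the first probability with the help of Hoeffding's inequality (see the appendix). The expected value of $B_n$ is $m(1-p_C)/2 > l(m,\varepsilon)$, thus

$$P\big(B_n < l(m,\varepsilon)\big) \le \exp\Big(-\frac{4}{m}\Big(\frac{m}{2}(1-p_C) - l(m,\varepsilon)\Big)^2\Big).$$

Recall that $1-p_C \ge \bar f_0/f_0^*$. For $m$ large enough, we have

$$\frac{m}{2}(1-p_C) - l(m,\varepsilon) \ge \frac{m}{2}(1-p_C)\frac{\varepsilon}{2} - 1 \ge \frac{m\,\varepsilon\,\bar f_0}{4 f_0^*} - 1 \ge \frac{m\,\varepsilon\,\bar f_0}{8 f_0^*}.$$

It follows that, for $m$ large enough,

$$P\big(B_n < l(m,\varepsilon)\big) \le \exp\Big(-\frac{m}{16}\Big(\frac{\varepsilon\,\bar f_0}{f_0^*}\Big)^2\Big).$$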

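Let us try to apply also Hoeffding's inequality to control the second probability. We get

$$P\Big(\sum_{k=1}^{2l(m,\varepsilon)} Y_k^i < \rho i\Big) \le \exp\Big(-\frac{1}{l(m,\varepsilon)}\big(2l(m,\varepsilon)\,\varepsilon_m(i) - \rho i\big)^2\Big).$$

Now

$$2l(m,\varepsilon)\,\varepsilon_m(i) \ge 2\,\frac{i f_0^*}{m\sqrt{\pi}\,\bar f_0}\,\frac{m}{2}(1-p_C)(1-\varepsilon)(1-p_M)^\ell = (1-\varepsilon)\,i\sqrt{\pi},$$

whence, using the hypothesis on $\rho$,

$$P\Big(\sum_{k=1}^{2l(m,\varepsilon)} Y_k^i < \rho i\Big) \le \exp\Big(-\frac{\pi\varepsilon^2 i^2}{m}\Big).$$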


This inequality becomes useful only when $i$ is of order $\delta m$ for some $\delta > 0$. For smaller values of $i$, we must proceed differently in order to control this probability. Thus we decompose the sum into $i$ blocks and we use the Chebyshev exponential inequality. Each block follows a binomial law, and we bound the Cramér transform of each block by the Cramér transform of a Poisson law having the same mean. More precisely, we choose for the block size

$$b = \Big\lfloor \frac{1}{i}\Big(2l(m,\varepsilon) - \frac{m}{2}(1-p_C)\varepsilon\Big) \Big\rfloor + 1$$

and we define the sum associated to each block of size $b$:

$$\forall j \in \{1,\dots,i\} \qquad Y_j' = \sum_{k=b(j-1)+1}^{bj} Y_k^i .$$


Notice that $Y_j'$ follows the binomial law with parameters $b$ and $\varepsilon_m(i)$. We will next estimate from below the product $b\,\varepsilon_m(i)$. By the choice of $b$ and $l$, we have

$$b \ge \frac{1}{i}\Big(2l(m,\varepsilon) - \frac{m}{2}(1-p_C)\varepsilon\Big), \qquad l(m,\varepsilon) \ge \frac{m}{2}(1-p_C)\Big(1-\frac{\varepsilon}{2}\Big),$$

whence

$$b \ge \frac{m}{i}(1-p_C)(1-\varepsilon)$$

and

$$E(Y_j') = b\,\varepsilon_m(i) \ge \sqrt{\pi}(1-\varepsilon) > \rho .$$

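Let $\delta_0 > 0$ be such that $\delta_0 < (1-p_C)\varepsilon/4$. Suppose that $i \le \delta_0 m$. We have also that

$$bi \le 2l(m,\varepsilon) - \frac{m}{2}(1-p_C)\varepsilon + i \le 2l(m,\varepsilon) - \frac{m}{2}(1-p_C)\varepsilon + \delta_0 m < 2l(m,\varepsilon).$$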

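Using the Chebyshev exponential inequality (see the appendix), we have then

$$P\Big(\sum_{k=1}^{2l(m,\varepsilon)} Y_k^i < \rho i\Big) \le P\Big(\sum_{k=1}^{bi} Y_k^i < \rho i\Big) \le P\Big(\sum_{j=1}^{i} Y_j' < \rho i\Big) \le P\Big(\sum_{j=1}^{i} (-Y_j') \ge -\rho i\Big) \le \exp\big(-i\,\Lambda^*_{-Y_1'}(-\rho)\big),$$

where $\Lambda^*_{-Y_1'}$ is the Cramér transform of $-Y_1'$. Let $Y_1''$ be a random variable following the Poisson law of parameter $b\,\varepsilon_m(i)$. We shall use lemma A.5 to compare the Cramér transforms of $-Y_1'$ and $-Y_1''$. By lemma A.5, we have

$$\Lambda^*_{-Y_1'}(-\rho) \ge \Lambda^*_{-Y_1''}(-\rho) = \rho \ln\Big(\frac{\rho}{b\,\varepsilon_m(i)}\Big) - \rho + b\,\varepsilon_m(i).$$

The map

$$\lambda \mapsto \rho \ln\Big(\frac{\rho}{\lambda}\Big) - \rho + \lambda$$

is non-decreasing on $[\rho, +\infty[$ and $b\,\varepsilon_m(i) \ge \sqrt{\pi}(1-\varepsilon)$, thus

$$\Lambda^*_{-Y_1'}(-\rho) \ge \rho \ln\Big(\frac{\rho}{\sqrt{\pi}(1-\varepsilon)}\Big) - \rho + \sqrt{\pi}(1-\varepsilon).$$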

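Let us denote by $c_0$ the righthand quantity. Then $c_0$ is positive and it depends only on $\rho$, $\pi$, $f_0^*/\bar f_0$ and $\varepsilon$. Finally, we have for $m$ large enough,

$$\forall i \in \{1,\dots,\lfloor \delta_0 m \rfloor\} \qquad P\Big(\sum_{k=1}^{2l(m,\varepsilon)} Y_k^i < \rho i\Big) \le \exp(-c_0 i),$$

whence

$$P(N_{n+1} < \rho i \mid N_n = i) \le \exp\Big(-\frac{m}{16}\Big(\frac{\varepsilon\,\bar f_0}{f_0^*}\Big)^2\Big) + \exp(-c_0 i).$$

For $i$ such that $\delta_0 m < i \le m/\sqrt{\pi}$, we had obtained

$$P(N_{n+1} < \rho i \mid N_n = i) \le \exp\Big(-\frac{m}{16}\Big(\frac{\varepsilon\,\bar f_0}{f_0^*}\Big)^2\Big) + \exp\Big(-\frac{\pi\varepsilon^2 i^2}{m}\Big).$$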

Let 77 g] 0, 1[ be small enough so that pco < tte^Sq and, for m large enough, 

(‘ -“p(-’’ t)) ■ 

For m large enough and i G { 1,..., [JomJ }, we have 
P{Nn+i < pi \N„ = i) 

< exp - 77 ^) (^1 - exp - ??y)) + exp ( - 77 ^ 0 ) 

< exp ( - 77 —) . 

For m large enough and S^m < i < mj we have also 
P{Nn+i < pi\Nn = i) 

< exp (1 - exp - ?7y)) + exp ( - 7777700 ) 

< exp ( - 77 ^) . 

These inequalities yield the claim of the proposition. □ 

We define

$$\tau^* = \inf\{\, n \ge 0 : N_n > m/\sqrt{\pi} \,\}.$$

Proposition 5.4 Let $\pi > 1$ be fixed. There exist $\kappa > 0$ and $p^* > 0$ which depend on $\pi$ and the ratio $f_0^*/\bar f_0$ only such that

$$\forall m \ge 1 \qquad P(\tau^* < \kappa \ln m \mid N_0 = 1) \ge p^* .$$

Proof. Let us define

$$\tau_0 = \inf\{\, n \ge 1 : N_n = 0 \,\}.$$

Recall that 0 is an absorbing state. Thus, if the hitting time of m is finite, then necessarily, it is smaller than the hitting time of 0. It follows that

$$P(\tau^* < \kappa \ln m \mid N_0 = 1) = P(\tau^* < \kappa \ln m,\ \tau^* < \tau_0 \mid N_0 = 1).$$

It is annoying to work with a Markov chain which has an absorbing state, so we first get rid of this problem. We consider the modified Markov chain $(\widetilde N_n)_{n\ge 0}$ which has the same transition probabilities as $(N_n)_{n\ge 0}$, except that we set the transition probability from 0 to 1 to be 1. The event we wish to estimate has the same probability for both processes, because they have the same dynamics outside of 0. So, from now onwards, we work with the Markov chain $(\widetilde N_n)_{n\ge 0}$, which is irreducible. Let $\rho > 1$, $c_0 > 0$, $m_0 \ge 1$ be as given in proposition 5.3. For $k \ge 0$, let $T_k$ be the first time the process $(\widetilde N_n)_{n\ge 0}$ hits $k$:

$$T_k = \inf\{\, n \ge 0 : \widetilde N_n = k \,\}.$$

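Let $\mathcal{E}$ be the event:

$$\mathcal{E} = \{\, \forall k \le m/\sqrt{\pi} \quad \widetilde N_{T_k+1} \ge \rho k \,\}.$$

We claim that, on the event $\mathcal{E}$, we have

$$\forall n < \tau^* \qquad \widetilde N_{n+1} \ge \rho \widetilde N_n .$$

Let us prove this inequality by induction on $n$. We have $T_1 = 0$ and $\widetilde N_1 \ge \rho \widetilde N_0$, so that the inequality is true for $n = 0$. Suppose that the inequality has been proved until rank $n < \tau^*$, so that

$$\forall k < n \qquad \widetilde N_{k+1} \ge \rho \widetilde N_k .$$

This implies in particular that

$$\widetilde N_0 < \widetilde N_1 < \dots < \widetilde N_n \le m/\sqrt{\pi}.$$

Suppose that $\widetilde N_n = i$. The above inequalities imply that $T_i = n$ and

$$\widetilde N_{T_i+1} = \widetilde N_{n+1} \ge \rho \widetilde N_n ,$$

so that the inequality still holds at rank $n+1$. Iterating the inequality until time $\tau^* - 1$, we see that

$$\widetilde N_{\tau^*-1} \ge \rho^{\tau^*-1}.$$

Moreover $\widetilde N_{\tau^*-1} \le m/\sqrt{\pi}$, thus

$$\tau^* \le 1 + \frac{\ln m}{\ln \rho}.$$

Let $m_1 \ge 1$ and $\kappa > 0$ be such that

$$\forall m \ge m_1 \qquad 1 + \frac{\ln m}{\ln \rho} < \kappa \ln m .$$

The constants $m_1$, $\kappa$ depend only on $\rho$, and we have

$$P(\tau^* < \kappa \ln m,\ \tau^* < \tau_0 \mid N_0 = 1) \ge P(\mathcal{E}).$$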

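We shall use the following lemma to bound $P(\mathcal{E})$ from below. To avoid too small indices, we write $T(i)$ instead of $T_i$.

Lemma 5.5 Let $k \in \{1,\dots,m\}$ and let $i_1,\dots,i_k$ be $k$ distinct points of $\{1,\dots,m\}$. The random variables $\widetilde N_{T(i_1)+1},\dots,\widetilde N_{T(i_k)+1}$ are independent.

Proof. We do the proof by induction over $k$. For $k = 1$, there is nothing to prove. Let $k \ge 2$ and suppose that the result has been proved until rank $k-1$. Let $i_1,\dots,i_k$ be $k$ distinct points of $\{1,\dots,m\}$. Let $j_1,\dots,j_k$ be $k$ points of $\{1,\dots,m\}$. Let us set

$$T = \min\{\, T(i_l) : 1 \le l \le k \,\}.$$

We denote by $(p(i,j))_{0\le i,j\le m}$ the transition matrix of the Markov chain $(\widetilde N_n)_{n\ge 0}$. Using the Markov property, we have

$$\begin{aligned}
P\big(\widetilde N_{T(i_1)+1} = j_1,\dots,\widetilde N_{T(i_k)+1} = j_k\big)
&= \sum_{1\le l\le k} P\big(\widetilde N_{T(i_1)+1} = j_1,\dots,\widetilde N_{T(i_k)+1} = j_k,\ T = T(i_l)\big) \\
&= \sum_{1\le l\le k} P\big(\widetilde N_{T(i_1)+1} = j_1,\dots,\widetilde N_{T(i_k)+1} = j_k \,\big|\, T = T(i_l)\big)\,P\big(T = T(i_l)\big) \\
&= \sum_{1\le l\le k} P\big(\forall h \ne l \ \ \widetilde N_{T(i_h)+1} = j_h,\ \widetilde N_1 = j_l \,\big|\, \widetilde N_0 = i_l\big)\,P\big(T = T(i_l)\big) \\
&= \sum_{1\le l\le k} p(i_l,j_l)\,P\big(\forall h \ne l \ \ \widetilde N_{T(i_h)+1} = j_h \,\big|\, \widetilde N_0 = j_l\big)\,P\big(T = T(i_l)\big).
\end{aligned}$$

We use the induction hypothesis:

$$P\big(\forall h \ne l \ \ \widetilde N_{T(i_h)+1} = j_h \,\big|\, \widetilde N_0 = j_l\big) = \prod_{h \ne l} p(i_h,j_h).$$

Substituting into the sum, we get

$$P\big(\widetilde N_{T(i_1)+1} = j_1,\dots,\widetilde N_{T(i_k)+1} = j_k\big) = \sum_{1\le l\le k}\ \prod_{1\le h\le k} p(i_h,j_h)\,P\big(T = T(i_l)\big) = \prod_{1\le h\le k} p(i_h,j_h).$$

This completes the induction step and the proof. □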




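Using lemma 5.5 and proposition 5.3, we obtain, for $m$ larger than $m_0$ and $m_1$,

$$\begin{aligned}
P(\mathcal{E}) &\ge \prod_{1\le k\le m} P\big(\widetilde N_{T_k+1} \ge \rho k\big)
= \prod_{1\le k\le m} \big(1 - P(N_1 < \rho k \mid N_0 = k)\big) \\
&\ge \prod_{1\le k\le m} \big(1 - \exp(-c_0 k)\big) \ge \prod_{k=1}^{\infty} \big(1 - \exp(-c_0 k)\big).
\end{aligned}$$

The last infinite product is converging. Let us denote its value by $p_1$. Let also

$$p_2 = \min\{\, P(\tau^* < \kappa \ln m \mid N_0 = 1) : m \le \max(m_0, m_1) \,\}.$$

The value $p_2$ is positive and the inequality stated in the proposition holds with $p^* = \min(p_1, p_2)$. □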
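To get a feeling for the size of the constant obtained from the infinite product $\prod_{k\ge 1}(1-\exp(-c_0 k))$, one can evaluate it numerically. The following sketch truncates the product once the factors are indistinguishable from 1 in floating point; the function name `product_lower_bound` and the tolerance are assumptions of the example, and the value of $c_0$ used below is arbitrary rather than one derived from the proof.

```python
from math import exp


def product_lower_bound(c0: float, tol: float = 1e-17) -> float:
    """Approximate prod_{k>=1} (1 - exp(-c0 * k)) for c0 > 0.

    The product is truncated as soon as exp(-c0 * k) drops below tol,
    since the remaining factors no longer change a double precision value.
    """
    if c0 <= 0:
        raise ValueError("c0 must be positive")
    prod = 1.0
    k = 1
    while True:
        term = exp(-c0 * k)
        if term < tol:
            return prod
        prod *= 1.0 - term
        k += 1


for c0 in (0.1, 0.5, 1.0, 2.0):
    print(c0, product_lower_bound(c0))
```

The product is positive for every $c_0 > 0$ and increases with $c_0$, but it becomes very small when $c_0$ is close to 0, so the lower bound on the success probability is only informative when the exponential estimate of proposition 5.3 is not too weak.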


6 Conclusion 

Our goal is to put forward the importance of the parameter $\pi$. To this end, we have studied the behavior of the simple genetic algorithm in two contrasting regimes. In the first case, we take $\ell = m$ and we run the genetic algorithm on the sharp peak landscape with an initial population containing exactly one Master sequence and $m-1$ chromosomes very far from it. The parameters of the genetic algorithm are set so that $\pi < 1$. We showed that the Master sequence is very likely to be lost and that the mean fitness does not increase significantly. In the second case, we consider an arbitrary fitness landscape and we start with a population such that $\pi > 1$. We showed that the mean fitness is likely to increase. From these results, we extrapolate a simple practical rule. We believe that the parameters of the genetic algorithm should be tuned so that $\pi$ is slightly larger than 1, that is, at each generation, we should have

$$\text{maximal fitness} \times (1-p_C)(1-p_M)^\ell > \text{mean fitness}.$$
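The rule above is easy to evaluate for a given population. The sketch below computes $\pi$ from the maximal fitness, the mean fitness and the operator probabilities, and solves the condition $\pi \ge \text{target}$ for the mutation probability. The function names and the `target` argument are hypothetical; the paper only states the inequality, not a tuning procedure.

```python
def pi_parameter(f_max: float, f_mean: float, p_c: float, p_m: float, ell: int) -> float:
    """pi = (f_max / f_mean) * (1 - p_c) * (1 - p_m) ** ell."""
    return (f_max / f_mean) * (1 - p_c) * (1 - p_m) ** ell


def max_mutation_probability(f_max, f_mean, p_c, ell, target=1.0):
    """Largest p_m such that pi >= target, or None if no p_m in [0, 1] works.

    Solves (1 - p_m) ** ell >= target * f_mean / (f_max * (1 - p_c)); assumes p_c < 1.
    """
    ratio = target * f_mean / (f_max * (1 - p_c))
    if ratio > 1:
        return None  # even without mutation, pi stays below the target
    return 1 - ratio ** (1 / ell)


# Hypothetical population: best fitness 4, mean fitness 1, chromosomes of length 20.
p_m = max_mutation_probability(f_max=4.0, f_mean=1.0, p_c=0.5, ell=20)
print(p_m, pi_parameter(4.0, 1.0, 0.5, p_m, 20))
```

Since the ratio of maximal to mean fitness changes from one generation to the next, such a computation would have to be repeated at each generation, which is one way to read the suggestion of an adaptive scheme.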

For instance, one could use an adaptive scheme for the parameters, as suggested in [5]. Of course this conclusion has to be taken with care. We hope that it will be further examined in future research. On the empirical side, it should be tested numerically on various problems. On the theoretical side, it might be extended to variants of the simple genetic algorithm, as well as to the evolutionary computation framework.

A Appendix

Monotonicity. We first recall some standard definitions concerning monotonicity and coupling for stochastic processes. A classical reference is Liggett's book [15], especially for applications to particle systems. In the next two definitions, we consider a discrete time Markov chain $(X_n)_{n\ge 0}$ with values in a space $E$. We suppose that the state space $E$ is finite and that it is equipped with a partial order $\le$. A function $f : E \to \mathbb{R}$ is non-decreasing if

$$\forall x,y \in E \qquad x \le y \ \Rightarrow\ f(x) \le f(y).$$


Definition A.1 The Markov chain $(X_n)_{n\ge 0}$ is said to be monotone if, for any non-decreasing function $f$, the function

$$x \in E \mapsto E\big(f(X_n) \mid X_0 = x\big)$$

is non-decreasing.

A natural way to prove monotonicity is to construct an adequate coupling. A coupling for the Markov chain $(X_n)_{n\ge 0}$ is a family of processes $(X_n^x)_{n\ge 0}$ indexed by $x \in E$, which are all defined on the same probability space, and such that, for $x \in E$, the process $(X_n^x)_{n\ge 0}$ is the Markov chain $(X_n)_{n\ge 0}$ starting from $X_0 = x$. The coupling is said to be monotone if

$$\forall x,y \in E \qquad x \le y \ \Rightarrow\ \forall n \ge 1 \quad X_n^x \le X_n^y .$$

If there exists a monotone coupling, then the Markov chain is monotone.

Stochastic domination. Let $\mu$, $\nu$ be two probability measures on $\mathbb{R}$. We say that $\nu$ stochastically dominates $\mu$, which we denote by $\mu \preceq \nu$, if for any non-decreasing positive function $f$, we have $\mu(f) \le \nu(f)$.

Lemma A.2 If $\mu$, $\nu$ are two probability measures on $\mathbb{N}$, then $\mu$ is stochastically dominated by $\nu$ if and only if

$$\forall i \in \mathbb{N} \qquad \mu([i,+\infty[) \le \nu([i,+\infty[).$$

Proof. Let $f : \mathbb{N} \to \mathbb{R}_+$ be a non-decreasing function. We compute

$$\mu(f) = \sum_{i\ge 0} \mu(i) f(i) = \sum_{i\ge 0} \big(\mu([i,+\infty[) - \mu([i+1,+\infty[)\big) f(i) = f(0) + \sum_{i\ge 1} \mu([i,+\infty[)\,\big(f(i) - f(i-1)\big).$$

Under the above hypothesis, we conclude indeed that $\mu(f) \le \nu(f)$. □

Lemma A.3 Let $n \ge 1$, $p \in [0,1]$, $\lambda > 0$ be such that $(1-p)^n \ge \exp(-\lambda)$. Then the binomial law $\mathcal{B}(n,p)$ of parameters $n,p$ is stochastically dominated by the Poisson law $\mathcal{P}(\lambda)$ of parameter $\lambda$.

Proof. Let $X_1,\dots,X_n$ be independent random variables with common law the Poisson law of parameter $-\ln(1-p)$. Let $Y$ be a further random variable, independent of $X_1,\dots,X_n$, with law the Poisson law of parameter $\lambda + n\ln(1-p)$, which is non-negative by the hypothesis on $\lambda$. Obviously, we have

$$Y + X_1 + \dots + X_n \ge \min(X_1,1) + \dots + \min(X_n,1).$$

Moreover, the law of the lefthand side is the Poisson law of parameter $\lambda$, while the law of the righthand side is the binomial law $\mathcal{B}(n,p)$. □
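The domination stated in lemma A.3 can be verified numerically through the tail criterion of lemma A.2. The sketch below compares the tails of a binomial law with those of the Poisson law whose parameter is the smallest one allowed by the hypothesis, namely $\lambda = -n\ln(1-p)$. The helper names `binomial_tail` and `poisson_tail` are introduced for the illustration only.

```python
from math import comb, exp, log


def binomial_tail(n: int, p: float, i: int) -> float:
    """P(X >= i) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(i, n + 1))


def poisson_tail(lam: float, i: int) -> float:
    """P(Y >= i) for Y ~ Poisson(lam), computed as 1 - P(Y < i)."""
    term = exp(-lam)  # P(Y = 0)
    cdf = 0.0
    for k in range(i):
        cdf += term
        term *= lam / (k + 1)  # P(Y = k + 1) from P(Y = k)
    return 1.0 - cdf


n, p = 12, 0.2
lam = -n * log(1 - p)  # smallest lambda with (1 - p)^n >= exp(-lambda)
dominated = all(
    binomial_tail(n, p, i) <= poisson_tail(lam, i) + 1e-12 for i in range(n + 2)
)
print(dominated)
```

With this choice of $\lambda$ the two tails coincide at $i = 1$, since both equal $1-(1-p)^n$, which is why a small numerical tolerance is used in the comparison.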


Lemma A. 4 Let A > 0 and let U be a random variable with law the 
Poisson law P(A) of parameter A. For any t > A, we have 




Proof. We write 


P{Y>t) = ^^exp(-A) = ^^exp(-A)A‘ 

k>t k>t 

k>t 


□

Let $Y$ be a random variable following the Poisson law $\mathcal{P}(\lambda)$. For any $t \in \mathbb{R}$, we have

$$\Lambda_Y(t) = \ln E(\exp(tY)) = \ln\Big(\sum_{k=0}^{\infty} \frac{\lambda^k}{k!}\exp(-\lambda + kt)\Big) = \lambda(\exp(t) - 1).$$

For any $a, t \in \mathbb{R}$,

$$\Lambda_{aY}(t) = \Lambda_Y(at) = \lambda(\exp(at) - 1).$$

Let us compute the Fenchel-Legendre transform $\Lambda^*_{aY}$. By definition, for $x \in \mathbb{R}$,

$$\Lambda^*_{aY}(x) = \sup_{t\in\mathbb{R}} \big(tx - \lambda(\exp(at) - 1)\big).$$

The maximum is attained at $t = (1/a)\ln(x/(\lambda a))$, hence

$$\Lambda^*_{aY}(x) = \frac{x}{a}\ln\Big(\frac{x}{\lambda a}\Big) - \frac{x}{a} + \lambda .$$
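The closed form of the Cramér transform of $aY$, for $Y$ a Poisson variable, can be compared with a direct numerical maximisation. In the sketch below, `cramer_poisson_scaled` implements the formula $(x/a)\ln(x/(\lambda a)) - x/a + \lambda$, which requires $x/a > 0$, and `cramer_numeric` maximises the concave map $t \mapsto tx - \lambda(\exp(at)-1)$ by ternary search. Both names, the search interval and the number of iterations are assumptions of the example.

```python
from math import exp, log


def cramer_poisson_scaled(x: float, lam: float, a: float) -> float:
    """Closed form (x/a) ln(x/(lam a)) - x/a + lam, valid when x/a > 0."""
    return (x / a) * log(x / (lam * a)) - x / a + lam


def cramer_numeric(x, lam, a, lo=-40.0, hi=40.0, iters=200):
    """Ternary search for sup_t [t x - lam (exp(a t) - 1)], concave in t.

    The maximiser is assumed to lie inside [lo, hi].
    """
    g = lambda t: t * x - lam * (exp(a * t) - 1.0)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2)


# Case a = -1, x = -rho: the quantity rho ln(rho/lam) - rho + lam.
rho, lam = 1.5, 2.0
print(cramer_poisson_scaled(-rho, lam, -1.0), cramer_numeric(-rho, lam, -1.0))
```

The case $a = -1$, $x = -\rho$ corresponds to the lower-tail estimate $\rho\ln(\rho/\lambda) - \rho + \lambda$, which vanishes at $\lambda = \rho$ and is positive for $\lambda > \rho$.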


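Lemma A.5 Let $p \in [0,1]$ and let $n \ge 1$. Let $X$ be a random variable following the binomial law $\mathcal{B}(n,p)$. Let $Y$ be a random variable following the Poisson law $\mathcal{P}(np)$. For any $a \in \mathbb{R}$, we have $\Lambda^*_{aX} \ge \Lambda^*_{aY}$.

Proof. For any $t \in \mathbb{R}$, we have

$$\Lambda_X(t) = \ln E(\exp(tX)) = n\ln\big(1 - p + p\exp(t)\big) \le np(\exp(t) - 1).$$

For any $a, t \in \mathbb{R}$,

$$\Lambda_{aX}(t) = \Lambda_X(at) \le np(\exp(at) - 1).$$

We recall that, if $Y$ is distributed according to the Poisson law of parameter $\lambda$, then

$$\forall t \in \mathbb{R} \qquad \Lambda_Y(t) = \lambda(\exp(t) - 1).$$

Thus, taking $\lambda = np$, we conclude that

$$\forall t \in \mathbb{R} \qquad \Lambda_{aX}(t) \le \Lambda_{aY}(t).$$

Taking the Fenchel-Legendre transform, we obtain

$$\forall x \in \mathbb{R} \qquad \Lambda^*_{aX}(x) \ge \Lambda^*_{aY}(x)$$

as required. □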

Hoeffding's inequality. We state Hoeffding's inequality for Bernoulli random variables. Suppose that $X$ is a random variable with law the binomial law $\mathcal{B}(n,p)$. We have

$$\forall t < np \qquad P(X < t) \le \exp\Big(-\frac{2}{n}(np - t)^2\Big).$$
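As a sanity check of Hoeffding's inequality in the binomial case, the exact lower tail can be compared with the exponential bound for a few thresholds below the mean. The functions `binomial_cdf_strict` and `hoeffding_bound` are illustrative helpers, and the values of $n$ and $p$ are arbitrary.

```python
from math import comb, exp


def binomial_cdf_strict(n: int, p: float, t: int) -> float:
    """Exact P(X < t) for X ~ Binomial(n, p) and an integer threshold t."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(t))


def hoeffding_bound(n: int, p: float, t: float) -> float:
    """Bound exp(-(2/n) (n p - t)^2), stated for t < n p."""
    return exp(-2.0 * (n * p - t) ** 2 / n)


n, p = 50, 0.4  # mean n p = 20
for t in (5, 10, 15, 19):
    print(t, binomial_cdf_strict(n, p, t), hoeffding_bound(n, p, t))
```

The bound is far from sharp close to the mean, but it decays like a Gaussian tail, which is all that is needed to control the number of pairs of chromosomes with no crossover.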

Chebyshev exponential inequality. Let $X_1,\dots,X_n$ be i.i.d. random variables with common law $\mu$. Let $\Lambda$ be the Log-Laplace of $\mu$, defined by

$$\forall t \in \mathbb{R} \qquad \Lambda(t) = \ln\Big(\int_{\mathbb{R}} \exp(ts)\,d\mu(s)\Big).$$

Let $\Lambda^*$ be the Cramér transform of $\mu$, defined by

$$\forall x \in \mathbb{R} \qquad \Lambda^*(x) = \sup_{t\in\mathbb{R}} \big(tx - \Lambda(t)\big).$$

We suppose that $\mu$ is integrable and we denote by $m$ its mean, i.e., $m = \int_{\mathbb{R}} x\,d\mu(x)$. We have then (see for instance [6])

$$\forall x \ge m \qquad P\Big(\frac{1}{n}(X_1 + \dots + X_n) \ge x\Big) \le \exp\big(-n\Lambda^*(x)\big).$$


Galton-Watson processes. Let $\nu$ be a probability distribution on the non-negative integers. Let $(Y_n)_{n\in\mathbb{N}}$ be a sequence of i.i.d. random variables distributed according to $\nu$. The Galton-Watson process with reproduction law $\nu$ is the sequence of random variables $(Z_n)_{n\in\mathbb{N}}$ defined by $Z_0 = 1$ and

Zn 

Vn € N Zn-\-i = ■ 

/t-i 

It is said to be subcritical if $E(\nu) < 1$ and supercritical if $E(\nu) > 1$. The
following estimates are classical (see for instance m)- 

Lemma A.6 Let $(Z_n)_{n\in\mathbb{N}}$ be a subcritical Galton-Watson process. There exists a positive constant $c$, which depends only on the law $\nu$, such that

$$\forall n \ge 1 \qquad P(Z_n > 0) \le \exp(-cn).$$

Proposition A.7 Let {Zn)n^fi be a supercritical Galton-Watson process 
such that E{v) is finite. Let 

n = inf { n > 1 : } . 

There exist $\kappa > 0$, $c_1 > 0$, $n_1 \ge 1$, such that

$$\forall n \ge n_1 \qquad P(\tau_1 < \kappa \ln n) \le \frac{1}{n^{c_1}}.$$

Proof. We have, for $k \ge 0$,

$$P(\tau_1 = k) \le P\big(\tau_1 \ge k,\ Z_k > n^{1/4}\big) \le P\big(Z_k > n^{1/4}\big) \le n^{-1/4} E(Z_k) \le n^{-1/4}\,(E(\nu))^k .$$

We sum this inequality: for n > 1, 

P(n <») < 

fe =0 

We choose $\kappa$ positive and sufficiently small, we apply this inequality with $\kappa \ln n$ instead of $n$ and we obtain the desired conclusion. □